Support vector machine fusion of idiolectal and acoustic speaker information in Spanish conversational speech

Authors

  • Daniel Garcia-Romero
  • Julian Fiérrez
  • Joaquín González-Rodríguez
  • Javier Ortega-Garcia
Abstract

This paper proposes a Support Vector Machine (SVM) based combining scheme that incorporates idiolectal and acoustic characteristics for speaker recognition. Two statistical modeling paradigms, GMMs for acoustic modeling and bigrams for language modeling, provide multilevel speaker information that yields better classification performance when the two are fused with an SVM. This combining approach is useful for any speaker recognition task where a considerable amount of data is available. Because no existing Spanish database made our experiments feasible, more than nine hours of Spanish conversational speech were collected from broadcast radio talk shows and manually transcribed.
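The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes each trial has already been reduced to two scores, one from an acoustic (GMM) system and one from an idiolectal (bigram) system, and trains a linear SVM on those score pairs. The synthetic scores, the function names, the hyperparameters, and the plain subgradient training loop are all assumptions made for illustration; the abstract does not specify the SVM kernel or training procedure the authors used.

```python
import random


def make_trials(n, rng, spread=0.6):
    """Synthetic (acoustic_score, idiolect_score) pairs (hypothetical data).

    Label +1 marks a target-speaker trial, -1 an impostor trial.
    """
    X, y = [], []
    for _ in range(n):
        label = rng.choice([1, -1])
        X.append([rng.gauss(label, spread), rng.gauss(label, spread)])
        y.append(label)
    return X, y


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted by stochastic subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Shrink the weights (gradient of the L2 penalty).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient is non-zero only inside the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def fuse(w, b, scores):
    """Fused score: signed distance-like value from the SVM hyperplane."""
    return sum(wj * sj for wj, sj in zip(w, scores)) + b


def accuracy(w, b, X, y):
    """Fraction of trials whose fused score has the same sign as the label."""
    hits = sum(1 for xi, yi in zip(X, y) if (fuse(w, b, xi) > 0) == (yi > 0))
    return hits / len(X)


if __name__ == "__main__":
    rng = random.Random(0)
    X_train, y_train = make_trials(400, rng)
    X_test, y_test = make_trials(400, rng)
    w, b = train_linear_svm(X_train, y_train)
    print("fusion weights:", w, "bias:", b)
    print("test accuracy:", accuracy(w, b, X_test, y_test))
```

In this sketch the learned weights play the role of the relative trust placed in each subsystem. With real data, the two score streams would typically be normalized before fusion.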

Related papers

Phonetic, idiolectal and acoustic speaker recognition

This paper describes a text-independent speaker recognition system that achieves an equal error rate of less than 1% by combining phonetic, idiolect, and acoustic features. The phonetic system is a novel language-independent speaker recognition system based on differences among speakers in dynamic realization of phonetic features (i.e., pronunciation), rather than spectral differences in voice q...

Comparative study of speaker personality traits recognition in conversational and broadcast news speech

Natural human-computer interaction requires, in addition to understanding what the speaker is saying, recognition of behavioral descriptors, such as the speaker’s personality traits (SPTs). The complexity of this problem depends on the high variability and dimensionality of the acoustic, lexical and situational context manifestations of the SPTs. In this paper, we present a comparative study of automa...

Detection of Non-Native Named Entities Using Prosodic Features for Improved Speech Recognition and Translation

In this work, we describe the use of acoustic-prosodic features to detect and localize non-native named entities spoken by a native speaker in the target language (English) for the purpose of improved speech recognition and translation. The exaggerated variation in accent and duration introduced by the speaker for non-native names is exploited in the detection process through the use of prosodi...

Speaker recognition based on idiolectal differences between speakers

“Familiar” speaker information is explored using non-acoustic features in NIST’s new “extended data” speaker detection task.[1] Word unigrams and bigrams, used in a traditional target/background likelihood ratio framework, are shown to give surprisingly good performance. Performance continues to improve with additional training and/or test data. Bigram performance is also found to be a function...
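The target/background likelihood ratio over word bigrams mentioned in this summary can be sketched roughly as follows. This is an illustrative reading only: the add-one smoothing, the averaging over test bigrams, the function names, and the toy transcripts are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def bigram_counts(words):
    """Count adjacent word pairs in a token list."""
    return Counter(zip(words, words[1:]))


def bigram_logprob(bigram, counts, total, vocab_size, alpha=1.0):
    """Add-alpha smoothed log-probability of a bigram (smoothing is an assumption)."""
    return math.log((counts[bigram] + alpha) / (total + alpha * vocab_size ** 2))


def llr_score(test_words, target_words, background_words):
    """Average log-likelihood ratio of the test bigrams: target vs background model."""
    test_bigrams = list(zip(test_words, test_words[1:]))
    if not test_bigrams:
        return 0.0
    vocab_size = len(set(test_words) | set(target_words) | set(background_words))
    tgt = bigram_counts(target_words)
    bkg = bigram_counts(background_words)
    tgt_total = sum(tgt.values())
    bkg_total = sum(bkg.values())
    total = 0.0
    for bg in test_bigrams:
        total += bigram_logprob(bg, tgt, tgt_total, vocab_size)
        total -= bigram_logprob(bg, bkg, bkg_total, vocab_size)
    return total / len(test_bigrams)


if __name__ == "__main__":
    # Toy transcripts (hypothetical): the target speaker overuses "you know".
    target = "you know i mean you know it was you know fine".split()
    background = "well it was fine and it was good and well it was late".split()
    print(llr_score("you know it was you know".split(), target, background))
    print(llr_score("well it was good and it was late".split(), target, background))
```

A positive score indicates that the test transcript's word pairs are better explained by the target speaker's model than by the background model. A detection threshold would be tuned on held-out data.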

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

Publication date: 2003